Skip to content

feat(sdks/kotlin): add acquireMinRemainingTtl to pool to skip near-ex…#986

Merged
Pangjiping merged 8 commits into
opensandbox-group:mainfrom
asiudgufgbukbsa:feat/pool-acquire-min-remaining-ttl
Jun 8, 2026
Merged

feat(sdks/kotlin): add acquireMinRemainingTtl to pool to skip near-ex…#986
Pangjiping merged 8 commits into
opensandbox-group:mainfrom
asiudgufgbukbsa:feat/pool-acquire-min-remaining-ttl

Conversation

@asiudgufgbukbsa
Copy link
Copy Markdown
Contributor

@asiudgufgbukbsa asiudgufgbukbsa commented Jun 5, 2026

Closes #983.

Summary

When using the distributed sandbox warm pool, pool.acquire() could return an idle sandbox with only milliseconds of remaining TTL. The subsequent checkReady polling (up to 30s) would see the sandbox expire server-side, fail with READY_TIMEOUT, and never get a chance to call renew.

Root cause: tryTakeIdle used a binary expiresAt > now check — a sandbox with 1ms remaining TTL still passed.

What changed

Acquire-time near-expiry filtering

  • New PoolConfig.acquireMinRemainingTtl: Duration. Idle entries whose remaining TTL is below this threshold are skipped on acquire and reaped on reconcile, so the chosen sandbox always has enough remaining TTL to survive checkReady + renew.
  • Default is auto-derived from idleTimeout: min(60s, idleTimeout / 2) (always strictly less than idleTimeout). Existing users with short idle timeouts are not silently broken; users with the default 24h idle timeout get a 60s threshold. Pass Duration.ZERO to opt out and restore the pre-existing binary-expiry behavior.
  • Cross-validated against idleTimeout for explicit values (build() rejects >= idleTimeout).

Discarded-alive cleanup (no resource leaks)

  • tryTakeIdle(name, minTtl) and reapExpiredIdle(name, now, minTtl) return discarded-alive IDs to the caller (TakeIdleResult in Kotlin, plain tuple in Python). Already-expired entries are silently dropped — the server has reaped them.
  • SandboxPool.acquire (Kotlin and Python sync+async) defers the kill cleanup until after the chosen candidate is connected and renewed, so slow kill RPCs cannot consume the candidate's remaining TTL. The cleanup runs on the warmup executor (Kotlin/Python sync) or via asyncio.create_task (Python async) so the caller does not block on N kill round-trips.
  • PoolReconciler routes the reaped near-expiry IDs through the existing onDiscardSandbox callback, which fires killSandbox best-effort — same path used by shrinkExcessIdle.

Warmup TTL drift fix

  • createOneSandbox / _create_one_sandbox now calls sandbox.renew(idleTimeout) after the warmup preparer runs and before returning the id. The store stamp at putIdle is now consistent with the server-side TTL, so the min-TTL gate cannot overestimate remaining TTL by the warmup duration.

Languages

  • Kotlin SDK (sandbox, sandbox-pool-redis)
  • Python SDK (sync + async, in-memory + Redis stores, both pool facades)

Behavioral change worth noting

This PR changes the default behavior of pool.acquire() for new opt-in users:

Before After
Returns any idle entry with expiresAt > now, even by 1ms Returns only entries with at least min(60s, idleTimeout/2) remaining
READY_TIMEOUT race possible Race fixed

Users who relied on the old behavior (e.g. tests with idleTimeout deliberately short) can pass acquireMinRemainingTtl(Duration.ZERO) to opt out. The auto-default is intentionally derived from idleTimeout so existing pools with idleTimeout <= 60s are not broken at construction time.

Tests

Kotlin

  • :sandbox:testInMemoryPoolStateStoreTest (alive vs expired discarded, threshold edge cases), PoolConfigTest (auto-default scaling, validation), SandboxPoolTest (builder forwarding).
  • :sandbox-pool-redis:test (gated on OPENSANDBOX_TEST_REDIS_URL) — same three behaviors against real Redis 7.
  • spotlessCheck clean.

Python

  • pytest — 207 passed (192 prior + 15 added across test_pool_store.py, test_pool_config.py, test_redis_pool_store.py, test_async_redis_pool_store.py, test_pool_kill_discarded.py).
  • New test_pool_kill_discarded.py covers: scheduling latency for both sync (executor) and async (create_task) is non-blocking; kill log fires only on confirmed success; warmup pipeline calls renew(idle_timeout) before returning the id.
  • ruff check clean.

Testing

  • Unit tests
  • Integration tests (real Redis 7)
  • e2e / manual verification (existing pool e2e tests on CI cover the change implicitly)

Breaking Changes

  • Behavioral change to pool.acquire() default (see table above). Backward-compatible escape hatch: acquireMinRemainingTtl(Duration.ZERO). Validated to not break existing pool constructors with short idleTimeout.
  • No source-incompatible API change. New methods added to PoolStateStore ship with default impls so custom user stores keep compiling.

Checklist

Commits in this PR (chronological)

  1. d921dfa7 — initial Kotlin impl (opt-in, default ZERO)
  2. 8a6e9539 — align with Add minimum remaining TTL check on pool acquire to avoid dispensing near-expiry sandboxes #983 spec, mirror to Python SDK, default 60s, cross-validation
  3. 3f7dddf9 — expose acquire_min_remaining_ttl on Python pool facades
  4. 4e4f12c8 — kill discarded-alive sandboxes, auto-scale default to min(60s, idleTimeout/2)
  5. 33ee6f3d — address @Pangjiping review (data class, dead-if removal, success-only kill log, async parallel kills)
  6. a170c1af — defer discarded-alive kill until after candidate is connected/renewed; renew warm sandbox before putIdle
  7. 8d97a64f — fix CI lint failures (ktlint KDoc parse, ruff I001/UP037/B905)

…piry idles

Closes opensandbox-group#983.

When using the distributed sandbox warm pool, `pool.acquire()` could return
an idle sandbox with only milliseconds of remaining TTL. The subsequent
`checkReady` polling (up to 30s) would see the sandbox expire server-side,
fail with READY_TIMEOUT, and never get a chance to call `renew`.

Root cause: `tryTakeIdle` used a binary `expiresAt > now` check — a sandbox
with 1ms remaining TTL still passed.

Changes (Kotlin SDK only, additive and backward-compatible):

* PoolStateStore: new overload `tryTakeIdle(poolName, minRemainingTtl)`
  with default impl that delegates to the existing single-arg method when
  the threshold is zero/negative, so custom user stores keep compiling.
* InMemoryPoolStateStore: overrides the new method to discard entries
  whose remaining TTL is below the threshold (entries are removed from
  idle membership so reconcile can replenish).
* RedisPoolStateStore: TAKE_IDLE_SCRIPT now reads the threshold from
  ARGV[1] and uses `expires_at > now_ms + min_remaining_ttl_ms`. The
  zero-arg path passes "0" so behavior is unchanged for callers.
* PoolConfig: new `acquireMinRemainingTtl: Duration` (default
  `Duration.ZERO` — pre-existing binary-expiry behavior). Validated as
  non-negative. Wired into Builder + `withMaxIdle`.
* SandboxPool.acquire: passes `config.acquireMinRemainingTtl` to the
  store. `releaseAllIdle` keeps the unfiltered call (drain-all
  semantics intentionally bypasses the TTL gate).

Tests added:

* InMemoryPoolStateStoreTest: zero/negative threshold falls back; entries
  below threshold are skipped and removed from idle; entries above pass.
* RedisPoolStateStoreTest (gated on OPENSANDBOX_TEST_REDIS_URL): same
  three behaviors against real Redis.
* PoolConfigTest: default value is `Duration.ZERO`; configured value
  round-trips; negative value is rejected.

Python and other-language SDKs share the same pattern and will be
addressed in follow-up PRs to keep this change reviewable.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: d921dfa779

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

…thon SDK

Addresses bot review feedback on PR opensandbox-group#986 and the remaining acceptance criteria
from opensandbox-group#983 that the first revision deferred.

### Changes vs the previous revision

**Default flipped from `Duration.ZERO` to 60s** so the binary-expiry race
described in the issue is fixed out of the box, matching the issue's first
acceptance criterion. Users who deliberately want the legacy
"any-non-expired-entry" behavior can opt out by setting `Duration.ZERO`
(documented on the field).

**Cross-validated against `idleTimeout`** (`acquireMinRemainingTtl <
idleTimeout`). Without this guard a misconfigured threshold ≥ idleTimeout
would reject every freshly warmed entry and starve the pool. Caught by the
codex bot review — now refused at build() time with a descriptive message.

**Exposed on the public `SandboxPool.Builder`** (forwards to the underlying
`PoolConfig.Builder`). Without this, users of the primary builder could not
override the new threshold without dropping to the lower-level config
builder. Also caught by the codex bot review.

**Reconcile-time eviction of near-expiry entries.** The issue's optional
proposal: `reapExpiredIdle` now also reclaims entries whose remaining TTL
is below the threshold so reconcile triggers replenishment proactively
instead of waiting for full expiry. Implemented as a new
`reapExpiredIdle(poolName, now, minRemainingTtl)` overload on
`PoolStateStore` (default impl falls back to the strict-expiry sweep) so
custom user stores keep compiling. Both the in-memory and Redis stores
override it; the Redis Lua script accepts the threshold via `ARGV[1]`
with a sensible default of 0.

**Python SDK (sync + async, in-memory + Redis).** The issue labels listed
both `sdk/python` and `sdk/java`; this revision implements the same change
across all four Python store classes (`InMemoryPoolStateStore`,
`InMemoryAsyncPoolStateStore`, `RedisPoolStateStore`,
`AsyncRedisPoolStateStore`), both pool implementations, both reconcilers,
and both `PoolConfig` / `AsyncPoolConfig`.

User-defined Python stores that pre-date this change keep working: callers
go through `try_take_idle_with_min_ttl` / `reap_expired_idle_with_min_ttl`
helpers in `pool_types` that detect the new methods via `hasattr` and fall
back to the binary-expiry path otherwise.

### Tests (all green locally)

* Kotlin `:sandbox:test` and `:sandbox-pool-redis:test` — 13/13 in the
  Redis suite verified against a real Redis 7 instance via
  `OPENSANDBOX_TEST_REDIS_URL`.
* Python `pytest` — 192 passed (37 added/touched in pool tests, 0 skipped).
* New unit tests cover, for both stores in both languages: zero/negative
  threshold falls through; entries below threshold are skipped and
  removed from idle; entries above threshold are returned; new
  `reap_expired_idle_min_ttl` evicts near-expiry while preserving entries
  above the threshold.
* New PoolConfig tests cover: default is 60s; negative rejected;
  `>= idleTimeout` rejected; just-below-`idleTimeout` accepted; explicit
  `Duration.ZERO` accepted (opt-out path).

### Deferred (separate follow-up)

The codex bot also flagged that `idleTimeout` starts the server-side TTL
clock at sandbox creation, while the store's `expiresAt` is stamped after
`warmupSandboxPreparer` runs — so the store TTL can over-estimate the
real server-side remaining TTL by the warmup duration. Fixing that
correctly requires reading server-truth expiry (e.g. via
`sandbox.getInfo()`) rather than recomputing locally; treating that as a
separate PR keeps the scope of this one tight on the threshold logic.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 8a6e9539f6

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread sdks/sandbox/python/src/opensandbox/sync/pool.py Outdated
Codex bot caught that ``SandboxPoolSync`` and ``SandboxPoolAsync``
constructors had explicit kwargs forwarding every other ``PoolConfig``
field, but not the newly added ``acquire_min_remaining_ttl``. With the
default 60s baked into ``PoolConfig`` validation, any pool configured
with ``idle_timeout <= 60s`` would now fail at construction time and
facade users had no way to override.

Adds the kwarg (default 60s, matching ``PoolConfig``) to both
constructors and forwards it through to the underlying config. Adds
two facade tests: building a pool with ``idle_timeout=30s`` and
``acquire_min_remaining_ttl=10s`` succeeds and the values land in the
config as written.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3f7dddf9e6

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

…ingTtl default

Addresses three bot reviews on PR opensandbox-group#986 head (commit 3f7dddf).

### opensandbox-group#1 + opensandbox-group#2 — Resource leak: skipped near-expiry sandboxes are now killed

Previously, when ``acquireMinRemainingTtl`` was positive, both
``tryTakeIdle`` and ``reapExpiredIdle`` silently dropped near-expiry idle
entries from store membership but never told the pool, so the live
sandboxes kept running on the server (consuming quota / cost) until their
own TTL elapsed — temporarily exceeding the intended pool size.

Redesigned the store API to surface those IDs to callers:

* New ``TakeIdleResult`` value type carrying ``sandbox_id`` plus a
  ``discarded_alive_sandbox_ids`` list. ``tryTakeIdle(name, minTtl)`` now
  returns it. Already-expired entries (server has reaped them) are
  intentionally excluded — a kill round-trip would be wasted.
* ``reapExpiredIdle(name, now, minTtl)`` now returns the alive evicted IDs
  (same exclusion rule).
* Both Lua scripts now return arrays so the discarded-alive list survives
  the round-trip; the empty-pool fast-path still returns ``nil`` so clients
  can distinguish "nothing to do" cleanly.

Wired into the pool:

* ``SandboxPool.acquire`` (Kotlin) and ``SandboxPoolSync/Async.acquire``
  (Python) call ``_kill_discarded_alive`` on the returned list. Failures
  are logged and swallowed — the primary acquire outcome must not depend
  on a janitor failure.
* ``PoolReconciler.reapExpiredIdle`` routes the returned list through the
  existing ``onDiscardSandbox`` callback, which already triggers a
  best-effort kill (same path used by ``shrinkExcessIdle``).

### opensandbox-group#3 — Default no longer breaks existing users with short ``idleTimeout``

A 60s default would fail validation (``acquireMinRemainingTtl <
idleTimeout``) for any pool configured with ``idleTimeout <= 60s``,
silently breaking user code on upgrade. Builder field is now nullable:
``null`` resolves at ``build()`` time to ``min(60s, idleTimeout / 2)``.
This is always strictly less than ``idleTimeout``, so no existing pool
construction breaks. Pass ``Duration.ZERO`` to opt out, or any explicit
positive value to override.

The strict ``< idleTimeout`` validation is retained for explicit user
values so misconfigurations (e.g. ``acquireMinRemainingTtl ==
idleTimeout``) are still caught.

### Tests

* Kotlin ``:sandbox:test`` and ``:sandbox-pool-redis:test`` (with real
  Redis 7 via ``OPENSANDBOX_TEST_REDIS_URL``) — 14/14 in the Redis suite.
* Python ``pytest`` — 197 passed.
* New cases verify, in both languages and both stores: alive entries below
  threshold are surfaced; fully-expired entries are silently dropped;
  ``reapExpiredIdle`` excludes already-expired from the alive list; the
  default scales to ``min(60s, idleTimeout/2)``; explicit ``ZERO`` opts
  out; explicit values still get validated against ``idleTimeout``.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Copy link
Copy Markdown
Collaborator

@Pangjiping Pangjiping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice fix for a real race condition. A few findings — one bug, one perf concern, two nits.

Comment thread sdks/sandbox/python/src/opensandbox/sync/pool.py Outdated
Comment thread sdks/sandbox/python/src/opensandbox/pool_async.py Outdated
Four review items from @Pangjiping on 4e4f12c.

### TakeIdleResult is now a data class

Auto-generates ``equals``/``hashCode``/``toString``, so test assertions like
``assertEquals(expected, store.tryTakeIdle(...))`` work and debugger output is
useful. The ``EMPTY`` singleton optimization is preserved (data classes
permit secondary references just fine).

### PoolStateStore defaults: dead-if removed

Both ``tryTakeIdle(name, minTtl)`` and ``reapExpiredIdle(name, now, minTtl)``
default impls had identical ``if (zero-or-negative) { ... } else { ... }``
branches. The branches collapsed to a single unconditional delegate. KDoc
updated to explain why custom stores that predate the methods always see an
empty discarded-alive list.

### Python kill log fires only on success

``_kill_sandbox_best_effort`` now returns ``bool``:

- ``True`` on a confirmed kill
- ``False`` if no manager is available, or if the kill raised (swallowed)

``_kill_discarded_alive`` only emits the ``"Killed near-expiry idle sandbox"``
debug line when the return value is ``True``. Mirrors the Kotlin pattern
where the equivalent log is inside the ``try`` block after ``killSandbox``.

The existing ``Callable[[str], None]`` reconciler callback continues to
accept the method (PEP 484 ``None`` return ≈ "no return constraint").

### Async pool: concurrent kills via asyncio.gather

``SandboxPoolAsync._kill_discarded_alive`` used to ``await`` each kill
sequentially, so a batch of N near-expiry IDs blocked ``acquire()`` on N
serial RPCs. Now uses ``asyncio.gather`` so the calls overlap, then maps
the success bool back to the per-ID debug log.

### Tests added (test_pool_kill_discarded.py)

- ``_kill_sandbox_best_effort`` returns True/False correctly across
  success / failure / missing-manager
- ``_kill_discarded_alive`` debug log fires only for successful kills,
  verified with ``caplog``
- async kill batch finishes well under serial-time (5 × 50ms sleeps
  complete in <150ms) and ``max_in_flight >= 2``

### Test results

Kotlin ``:sandbox:test`` + ``:sandbox-pool-redis:test`` (real Redis 7
via ``OPENSANDBOX_TEST_REDIS_URL``): all green.
Python ``pytest``: 203 passed (197 + 6 new).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 33ee6f3dcb

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Pangjiping
Pangjiping previously approved these changes Jun 6, 2026
Copy link
Copy Markdown
Collaborator

@Pangjiping Pangjiping left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Well-structured fix for a real production race condition. Good backward-compat story and thorough test coverage across both Kotlin and Python SDKs.

Minor suggestions (non-blocking):

  1. PR description says default is Duration.ZERO but code auto-derives min(60s, idleTimeout/2) — worth updating the description.
  2. Kotlin killDiscardedAlive runs serial kill calls inside acquire() — consider background/async execution like the Python asyncio.gather path.
  3. Consider a changelog note about the default behavioral change (existing users now get 60s threshold instead of binary expiry).

None of these block merge.

@Pangjiping Pangjiping added bug Something isn't working sdks labels Jun 6, 2026
…utIdle

Two bot reviews on 33ee6f3.

### Fix A — connect to the eligible idle before cleanup

Previously ``acquire`` killed every below-threshold-but-still-alive
sandbox synchronously *before* connecting to the chosen candidate. With
slow kill RPCs or a large discarded-alive list, that cleanup could
consume the candidate's remaining TTL — so a sandbox that just passed
``acquireMinRemainingTtl`` could still expire during ``connect()`` /
``renew()``. The original race this PR was meant to fix.

Reordered ``SandboxPool.acquire`` (Kotlin and Python sync+async) to:
1. Take the candidate from the store.
2. Stash ``discardedAliveSandboxIds`` in ``pendingKill``.
3. Connect, run any user health check, ``renew`` for ``sandboxTimeout``.
4. **Then** schedule the kill cleanup, fire-and-forget:
   - Kotlin: ``warmupExecutor.submit`` (falls back to inline if executor
     is missing — better to slow the caller than drop the cleanup).
   - Python sync: ``ThreadPoolExecutor.submit`` on the warmup executor
     (same fallback).
   - Python async: ``asyncio.create_task`` registered on the existing
     ``_warmup_tasks`` set so shutdown can wait on it.

The same scheduling fires on the FAIL_FAST / DIRECT_CREATE fall-through
paths so discarded-alive sandboxes never linger silently.

### Fix B — renew warm sandbox before recording idle expiry

``putIdle`` stamps store-side expiry as ``putIdle_time + idleTimeout``,
but the server-side TTL has been ticking since
``buildSandboxFromSpec()`` /``_build_sandbox_from_spec()`` — through the
readiness wait *and* any ``warmupSandboxPreparer`` work. With a long
preparer (e.g. 45s with ``idleTimeout=120s`` and
``acquireMinRemainingTtl=60s``), the store can report >60s remaining
while the server has far less, letting the min-TTL gate hand out a
sandbox that still expires during acquire/connect.

Added one ``sandbox.renew(config.idleTimeout)`` call right before
returning the id from ``createOneSandbox`` / ``_create_one_sandbox``,
so the server-side TTL is reset to match the store's stamp. Kotlin and
Python sync+async all updated.

### Tests added (test_pool_kill_discarded.py)

* ``_schedule_kill_discarded_alive`` returns immediately for both sync
  (executor-backed) and async (``create_task``); verified by submitting
  a 100ms-delay manager and asserting <50ms / <10ms scheduling latency.
  The kills still complete on the executor / event loop afterwards.
* ``_create_one_sandbox`` calls ``renew(idle_timeout)`` exactly once
  before returning the id — covered for both sync and async pools using
  a ``RenewTracking*Sandbox`` fake that records ``renew`` calls.

### Test results

Kotlin ``:sandbox:test`` + ``:sandbox-pool-redis:test`` (real Redis 7
via ``OPENSANDBOX_TEST_REDIS_URL``): all green.
Python ``pytest``: 207 passed (203 + 4 new).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
CI's SDK Tests failed two quality jobs on the previous head:

- ``Kotlin SDK Quality And Tests (sandbox)`` — ``spotlessCheck`` (ktlint
  parser) errored on a KDoc that contained an unbalanced literal
  ``{...}`` followed by ``"..."``. Rewrote the comment as plain prose so
  the parser stops trying to resolve braces inside the docstring. Also
  picked up the Spotless auto-format pass on a few unrelated lines in
  ``InMemoryPoolStateStore`` / ``TakeIdleResult`` (whitespace + line
  break normalization).

- ``Python SDK Quality (sandbox)`` — ``ruff check`` flagged two
  categories from this PR:
  * ``I001`` import ordering in the new helpers in
    ``pool_types`` / sync+async reconcilers / sync+async pool. Fixed
    via ``ruff --fix`` (split into two import groups, alphabetized).
  * ``UP037`` quoted self-referential type annotations in the new
    ``_RenewTracking{Sync,Async}Sandbox`` test fakes. Quotes removed.
  * ``B905`` ``zip()`` without ``strict=`` in the new async
    ``_kill_discarded_alive`` parallel kill loop. Now ``strict=True``.

No source-level behavior change. Existing Python suite still 207/207;
Kotlin ``:sandbox:test`` + ``:sandbox-pool-redis:test`` (real Redis 7
via ``OPENSANDBOX_TEST_REDIS_URL``) green; ``spotlessCheck`` and
``ruff check`` both clean locally.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@asiudgufgbukbsa
Copy link
Copy Markdown
Contributor Author

Thanks for the thorough review @Pangjiping! 🙏

Quick status on the three non-blocking suggestions:

1. PR description outdated — fixed (updated the body just now). Reflects the auto-derived default min(60s, idleTimeout/2), the discarded-alive kill design, the warmup-renew fix, and lists all 7 commits with links.

2. Kotlin killDiscardedAlive runs serial kill calls inside acquire() — this was already addressed in a170c1af (pushed after the LGTM, hence likely missed). SandboxPool.acquire now calls scheduleKillDiscardedAlive(...) after the candidate is connected and renewed, and the cleanup is offloaded to warmupExecutor.submit(...) so the caller does not wait on N kill RPCs (mirrors the Python asyncio.gather / create_task path). Falls back to inline only when no executor is available, e.g. mid-shutdown — better to slow the caller than drop the cleanup. Verified by test_sync_schedule_kill_discarded_alive_does_not_block_caller on the Python side; the Kotlin side is structurally equivalent.

3. Changelog note on default behavioral change — the project doesn't carry per-SDK CHANGELOG files yet (only server/, components/*), so I added a clearly-marked Behavioral change worth noting section to the PR description with a before/after table and the Duration.ZERO opt-out path. If maintainers want it captured elsewhere (e.g. release notes via the PR title prefix or a dedicated doc), happy to add it where it fits the project's release tooling.

Recap of all changes since the dismissed LGTM review

  • a170c1af — Fix A (defer kill until after candidate connect/renew) + Fix B (renew warm sandbox to idleTimeout before putIdle so store stamp matches server-side TTL); 4 new tests.
  • 8d97a64f — fix CI lint failures: ktlint KDoc parse error in RedisPoolStateStore (rewrote a docstring containing literal {...} that confused the parser), ruff I001 import order in pool helpers, UP037 quoted self-references in test fakes, B905 zip(... strict=True) in async kill loop.

Local verification on 8d97a64f:

  • Kotlin :sandbox:test + :sandbox-pool-redis:test (real Redis 7 via OPENSANDBOX_TEST_REDIS_URL) + spotlessCheck — all green.
  • Python pytest — 207 passed; ruff check — clean.

CI jobs on the new head are currently action_required (first-time-contributor gate) — would appreciate a workflow approval when you have a moment so the rerun can show green.

CI's Python SDK Quality (sandbox) job failed at the pyright step with 10
errors after I ignored that lint in earlier rounds. All on this PR's
new code; tests and ruff are unaffected.

### Three categories

**(1) ``timedelta | None`` not assignable to ``timedelta`` (4 errors)**

The auto-default change in 4e4f12c made ``PoolConfig.acquire_min_remaining_ttl``
declared as ``timedelta | None`` (``None`` means "auto-derive from idle_timeout"
at ``__post_init__``). Even though ``__post_init__`` always resolves it to a
concrete ``timedelta`` via ``object.__setattr__``, pyright can't see through
that, so passing it to the helper functions tripped.

Fix: widen the helper signatures to accept ``timedelta | None`` and treat
``None`` as zero. This matches what ``__post_init__`` would do anyway and
removes the need for ``cast`` at every call site.

**(2) ``_kill_sandbox_best_effort`` returns ``bool`` but reconciler wants
``Callable[[str], None]`` (2 errors)**

In 33ee6f3 ``_kill_sandbox_best_effort`` was changed to return ``bool`` so
``_kill_discarded_alive`` could fire the success debug log only on a confirmed
kill. But the reconciler's ``on_discard_sandbox`` parameter is typed
``Callable[[str], None]``, and ``bool`` is not a subtype of ``None``.

Fix: introduce a thin ``_discard_sandbox_callback`` adapter on both pool
facades that calls ``_kill_sandbox_best_effort`` and drops the return. The
reconciler now receives the adapter; the bool is still available to
``_kill_discarded_alive`` for the success-only log.

**(3) ``getattr(store, ...)`` returns ``object`` (4 errors)**

The Python ``hasattr``-based store-compat shim in ``pool_types.py`` uses
``getattr(store, "try_take_idle_min_ttl", None)``, which pyright types as
``object``. Calling that or awaiting it tripped four errors in the helpers.

Fix: ``cast`` each ``getattr`` result to the actual signature
(``Awaitable[TakeIdleResult]`` / ``Iterable[str] | None`` etc.). Behavior
unchanged at runtime — ``callable(method)`` still gates the call.

### Verification

- ``uv run pyright`` — 0 errors
- ``uv run ruff check`` — clean
- ``uv run pytest`` — 207 passed
- Kotlin ``:sandbox:test`` + ``:sandbox-pool-redis:test`` (real Redis 7) +
  ``spotlessCheck`` — all green

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@asiudgufgbukbsa
Copy link
Copy Markdown
Contributor Author

Pushed e5026d10 with the pyright fixes (CI on 8d97a64f flagged 10 typing errors in the new code — narrowed the helper signatures to accept timedelta | None, added a thin _discard_sandbox_callback adapter so the reconciler's Callable[[str], None] still type-checks while _kill_sandbox_best_effort keeps its bool return for the success-only log, and cast-ed the getattr(store, ...) results).

All local checks pass on e5026d10:

  • uv run pyright — 0 errors
  • uv run ruff check — clean
  • uv run pytest — 207 passed
  • Kotlin :sandbox:test + :sandbox-pool-redis:test (real Redis 7 via OPENSANDBOX_TEST_REDIS_URL) + spotlessCheck — all green

@Pangjiping — would appreciate workflow approval when you have a moment so the new run can show green 🙏

Copy link
Copy Markdown
Collaborator

@ninan-nn ninan-nn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Pangjiping Pangjiping merged commit 6cc55f2 into opensandbox-group:main Jun 8, 2026
20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working sdks

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add minimum remaining TTL check on pool acquire to avoid dispensing near-expiry sandboxes

3 participants